MACS 30100
University of Chicago
Minimize the within-cluster variation
\[\min_{C_1, C_2, \dots, C_K} \left\{ \sum_{k = 1}^K W(C_k) \right\}\]
Squared Euclidean distance
\[W(C_k) = \frac{1}{|C_k|} \sum_{i,i' \in C_k} \sum_{j = 1}^p (x_{ij} - x_{i'j})^2\]
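This formula can be checked numerically. The sketch below computes \(W(C_k)\) directly from the pairwise definition. It also computes it from the equivalent centroid form \(2 \sum_{i \in C_k} \sum_{j=1}^p (x_{ij} - \bar{x}_{kj})^2\), where \(\bar{x}_{kj}\) is the mean of feature \(j\) in cluster \(k\). That identity is why assigning each point to its nearest centroid lowers the objective. The two toy clusters are hypothetical data.

```python
# Sketch: within-cluster variation W(C_k) under squared Euclidean distance.

def within_cluster_variation(cluster):
    """W(C_k) = (1/|C_k|) * sum over ordered pairs (i, i') of squared distances."""
    n_k = len(cluster)
    total = 0.0
    for x_i in cluster:
        for x_ip in cluster:
            total += sum((a - b) ** 2 for a, b in zip(x_i, x_ip))
    return total / n_k

def centroid(cluster):
    """Coordinate-wise mean of the cluster."""
    p = len(cluster[0])
    return [sum(x[j] for x in cluster) / len(cluster) for j in range(p)]

def centroid_form(cluster):
    """Equivalent form: 2 * sum of squared deviations from the centroid."""
    c = centroid(cluster)
    return 2 * sum(sum((a - m) ** 2 for a, m in zip(x, c)) for x in cluster)

def total_within(clusters):
    """The quantity minimized over C_1, ..., C_K."""
    return sum(within_cluster_variation(c) for c in clusters)

cluster_a = [[1.0, 2.0], [2.0, 2.0], [3.0, 5.0]]
cluster_b = [[8.0, 8.0], [9.0, 7.0]]
print(within_cluster_variation(cluster_a), centroid_form(cluster_a))  # 16.0 16.0
print(total_within([cluster_a, cluster_b]))  # 18.0
```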
Global vs. local optimum
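K-means (Lloyd's algorithm) alternates between two steps:

- computing cluster centroids;
- reassigning each observation to its nearest centroid.

Each pass can only lower the total within-cluster variation, so the algorithm stops at a local optimum that depends on the random starting assignment. A common remedy is to run several random starts and keep the solution with the smallest objective.

The pure-Python sketch below illustrates this. Its function names, the `n_starts` parameter, and the rule for re-seeding an empty cluster are illustrative choices, not taken from the slides.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean (the cluster centroid)."""
    p = len(points[0])
    return [sum(pt[j] for pt in points) / len(points) for j in range(p)]

def within_objective(X, labels, K):
    """Sum of W(C_k), using W(C_k) = 2 * (sum of squared distances to the centroid)."""
    total = 0.0
    for k in range(K):
        members = [x for x, l in zip(X, labels) if l == k]
        if members:
            c = mean(members)
            total += 2 * sum(sq_dist(x, c) for x in members)
    return total

def kmeans_once(X, K, rng, max_iter=100):
    """One run of Lloyd's algorithm from a random initial assignment."""
    labels = [rng.randrange(K) for _ in X]
    for _ in range(max_iter):
        centroids = []
        for k in range(K):
            members = [x for x, l in zip(X, labels) if l == k]
            # Assumption: an empty cluster is re-seeded at a randomly chosen observation.
            centroids.append(mean(members) if members else list(rng.choice(X)))
        # Reassign every observation to its closest centroid.
        new_labels = [min(range(K), key=lambda k: sq_dist(x, centroids[k])) for x in X]
        if new_labels == labels:
            break  # assignments stopped changing: a (possibly local) optimum
        labels = new_labels
    return labels

def kmeans(X, K, n_starts=20, seed=0):
    """Run several random starts and keep the lowest-objective solution."""
    rng = random.Random(seed)
    best_obj, best_labels = float("inf"), None
    for _ in range(n_starts):
        labels = kmeans_once(X, K, rng)
        obj = within_objective(X, labels, K)
        if obj < best_obj:
            best_obj, best_labels = obj, labels
    return best_obj, best_labels

X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
obj, labels = kmeans(X, K=2)
print(obj, labels)
```

Keeping the best of many starts does not guarantee the global optimum. It only makes a poor local optimum less likely.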
\[Z_1 = \phi_{11}X_1 + \phi_{21}X_2 + \dots + \phi_{p1}X_p\]
\[\sum_{j=1}^p \phi_{j1}^2 = 1\]
\[\max_{\phi_{11}, \dots, \phi_{p1}} \left \{ \frac{1}{n} \sum_{i=1}^n z_{i1}^2 \right \}\]
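subject to \(\sum_{j=1}^p \phi_{j1}^2 = 1\), where \(z_{i1} = \phi_{11}x_{i1} + \phi_{21}x_{i2} + \dots + \phi_{p1}x_{ip}\) is the score of observation \(i\) on the first principal component (each column of \(X\) centered to mean zero)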
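For centered data, the loading vector that maximizes \(\frac{1}{n}\sum_i z_{i1}^2\) under the unit-norm constraint is the leading eigenvector of the sample covariance matrix. The sketch below finds it by power iteration, renormalizing at every step so that \(\sum_j \phi_{j1}^2 = 1\) always holds. Power iteration is a swapped-in technique chosen for a dependency-free example; in practice an SVD or eigendecomposition routine would be used. It assumes the largest eigenvalue is strictly larger than the rest. The toy data are hypothetical.

```python
import math

def center(X):
    """Subtract each column mean so every feature has mean zero."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def covariance(Xc):
    """Sample covariance with a 1/n factor, matching (1/n) * sum of z_i1^2."""
    n, p = len(Xc), len(Xc[0])
    return [[sum(r[a] * r[b] for r in Xc) / n for b in range(p)] for a in range(p)]

def first_pc(X, n_iter=500):
    """Loadings phi_11..phi_p1, scores z_i1, and the variance (1/n) * sum z_i1^2."""
    Xc = center(X)
    S = covariance(Xc)
    p = len(S)
    phi = [1.0 / math.sqrt(p)] * p  # unit-norm starting vector
    for _ in range(n_iter):
        v = [sum(S[a][b] * phi[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        phi = [x / norm for x in v]  # re-impose sum(phi_j1^2) = 1
    scores = [sum(phi[j] * row[j] for j in range(p)) for row in Xc]
    variance = sum(z * z for z in scores) / len(scores)
    return phi, scores, variance

X = [[2, 0], [0, 0], [-2, 0], [0, 1], [0, -1]]
phi, scores, variance = first_pc(X)
print(phi, variance)
```

In this example the first feature has the larger variance (1.6 vs. 0.4), so the loadings converge to roughly \((1, 0)\). The variance of the first-component scores then equals that largest variance.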
USArrests
a    abandoned  abc  ability  able  about  above  abroad  absorbed  absorbing  abstract
43   0          0    0        0     10     0      0       0         0          1
NYTimes
## [1] "penchant" "brought" "structure" "willing" "yielding"
## [6] "bare" "school" "halls" "challenge" "step"
## [11] "largest" "lovers" "intense" "borders" "mall"
## [16] "classic" "conducted" "mirrors" "hole" "location"
## [21] "desperate" "published" "head" "paints" "another"
## [26] "starts" "familiar" "window" "thats" "broker"
Rinse and repeat
## <<DocumentTermMatrix (documents: 2246, terms: 10134)>>
## Non-/sparse entries: 259208/22501756
## Sparsity : 99%
## Maximal term length: 18
## Weighting : term frequency (tf)
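The summary printed above can be checked by hand. The matrix has \(2246 \times 10134 = 22{,}760{,}964\) cells, which equals \(259{,}208\) non-zero plus \(22{,}501{,}756\) zero entries. The share of zeros is \(22{,}501{,}756 / 22{,}760{,}964 \approx 98.9\%\), reported as 99%. The sketch below reproduces that arithmetic. It also builds a raw term-frequency matrix for a toy corpus. The whitespace tokenizer, the function names, and the three toy documents are hypothetical and only mimic the bookkeeping, not the preprocessing used for the NYTimes data.

```python
from collections import Counter

def document_term_matrix(docs):
    """Rows are documents, columns are sorted vocabulary terms, entries are raw counts (tf)."""
    tokenized = [doc.lower().split() for doc in docs]  # assumption: naive whitespace tokens
    vocab = sorted({w for toks in tokenized for w in toks})
    counts = [Counter(toks) for toks in tokenized]
    return vocab, [[c[w] for w in vocab] for c in counts]

def sparsity_summary(n_docs, n_terms, nonzero):
    """Return (non-sparse entries, sparse entries, sparsity as a whole percent)."""
    total = n_docs * n_terms
    sparse = total - nonzero
    return nonzero, sparse, round(100 * sparse / total)

# Check the figures reported for the NYTimes matrix.
print(sparsity_summary(2246, 10134, 259208))  # (259208, 22501756, 99)

# The same bookkeeping on a toy corpus.
docs = ["the cat sat", "the dog sat down", "a cat"]
vocab, dtm = document_term_matrix(docs)
nonzero = sum(1 for row in dtm for v in row if v != 0)
print(vocab)
print(sparsity_summary(len(dtm), len(vocab), nonzero))  # (9, 9, 50)
print(max(len(w) for w in vocab))  # maximal term length: 4
```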